131 research outputs found

    Assessment Method for a Power Analysis to Identify Differentially Expressed Pathways

    Gene expression data can provide a rich source of information for elucidating biological function at the pathway level, provided the experimental design considers the needs of the statistical analysis methods. The purpose of this paper is to provide a comparative analysis of statistical methods for detecting the differential expression of pathways (DEP). In contrast to many other studies conducted so far, we use three novel simulation types that produce a more realistic correlation structure than previous simulation methods. This also includes the generation of surrogate data from two large-scale microarray experiments on prostate cancer and ALL. As a result of our comprehensive analysis of [Image: see text] parameter configurations, we find that each method should only be applied if certain conditions of the data from a pathway are met. Further, we provide method-specific estimates of the optimal sample size for microarray experiments aiming to identify DEP, in order to avoid an underpowered design. Our study highlights the sensitivity of the studied methods to the parameters of the system.

    Constrained Covariance Matrices With a Biologically Realistic Structure: Comparison of Methods for Generating High-Dimensional Gaussian Graphical Models

    High-dimensional data from molecular biology possess an intricate correlation structure that is imposed by the molecular interactions between genes and their products, which form various types of gene networks. This fact is particularly well known for gene expression data, because a sufficient number of large-scale data sets are available that are amenable to a sensible statistical analysis confirming this assertion. The purpose of this paper is twofold. First, we investigate three methods for generating constrained covariance matrices with a biologically realistic structure. Such covariance matrices play a pivotal role in designing novel statistical methods for high-dimensional biological data, because they make it possible to define Gaussian graphical models (GGM) for the simulation of realistic data, including their correlation structure. We study local and global characteristics of these covariance matrices and of the derived concentration/partial correlation matrices. Second, we connect these results, obtained from a probabilistic perspective, to statistical results of studies aiming to estimate gene regulatory networks from biological data. This connection sheds light on the well-known heterogeneity of statistical estimation methods for inferring gene regulatory networks and provides an explanation for the difficulties in inferring molecular interactions between highly connected genes.
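
    The relation between a concentration matrix and the partial correlations mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj); the function name `partial_correlations` and the 3-gene chain matrix are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math


def partial_correlations(precision):
    """Convert a concentration (precision) matrix into partial correlations.

    Off-diagonal entries follow rho_ij = -omega_ij / sqrt(omega_ii * omega_jj);
    the diagonal is set to 1.0 by convention.
    """
    p = len(precision)
    result = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


# Hypothetical concentration matrix for a chain gene1 - gene2 - gene3.
# The zero entry encodes the absence of an edge between gene1 and gene3.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
pcor = partial_correlations(omega)
```

    A zero in the concentration matrix yields a zero partial correlation, which is why the sparsity pattern of that matrix can be read as the edge set of a Gaussian graphical model.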

    Graph-based exploitation of gene ontology using GOxploreR for scrutinizing biological significance.

    Gene ontology (GO) is an eminent knowledge base frequently used for providing biological interpretations for the analysis of genes or gene sets from biological, medical and clinical problems. Unfortunately, the interpretation of such results is challenging due to the large number of GO terms, their hierarchical and connected organization as directed acyclic graphs (DAGs) and the lack of tools that exploit this structural information explicitly. For this reason, we developed the R package GOxploreR. The main features of GOxploreR are (I) easy and direct access to structural features of GO, (II) structure-based ranking of GO-terms, (III) mapping to reduced GO-DAGs including visualization capabilities and (IV) prioritizing of GO-terms. The underlying idea of GOxploreR is to exploit a graph-theoretical perspective of GO, as manifested by its DAG-structure and the hierarchy levels it contains, for accumulating semantic information. All these features enhance the utilization of structural information of GO and complement existing analysis tools. Overall, GOxploreR provides exploratory as well as confirmatory tools for complementing any kind of analysis resulting in a list of GO-terms, e.g., from differentially expressed genes or gene sets, GWAS or biomarkers. Our R package GOxploreR is freely available from CRAN.

    Analyzing the Scholarly Literature of Digital Twin Research : Trends, Topics and Structure

    Currently, studies involving a digital twin are gaining widespread interest. While the first fields to adopt such a concept were manufacturing and engineering, interest has lately extended beyond these fields across all academic disciplines. Given the inviting idea behind a digital twin, which allows the efficient exploitation and utilization of simulations, such a trend is understandable. The purpose of this paper is to use a scientometrics approach to study the early publication history of the digital twin across academia. Our analysis is based on large-scale bibliographic and citation data from Scopus that provides authoritative information about high-quality publications in essentially all fields of science, engineering and the humanities. This paper has four major objectives. First, we obtain a global overview of all publications related to a digital twin across all major subject areas. This analysis provides insights into the structure of the entire publication corpus. Second, we investigate the co-occurrence of subject areas appearing together on publications. This reveals interdisciplinary relations of the publications and identifies the most collaborative fields. Third, we conduct a trend and keyword analysis to gain insights into the evolution of the concept and the importance of keywords. Fourth, based on results from topic modeling using a Latent Dirichlet Allocation (LDA) model, we introduce the definition of a scientometric dimension (SD) of digital twin research that summarizes an important aspect of the bound diversity of the academic literature.
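
    The co-occurrence analysis named as the second objective can be sketched as a simple pair count. This is only an illustration under stated assumptions: the function name `subject_cooccurrence`, the input layout (one list of subject areas per publication) and the sample records are hypothetical and do not come from the paper or from Scopus.

```python
from collections import Counter
from itertools import combinations


def subject_cooccurrence(publications):
    """Count how often two subject areas appear on the same publication.

    `publications` is an iterable of subject-area lists. Each unordered pair
    is counted once per publication, keyed as an alphabetically sorted tuple.
    """
    counts = Counter()
    for areas in publications:
        # Deduplicate and sort so that (A, B) and (B, A) share one key.
        for pair in combinations(sorted(set(areas)), 2):
            counts[pair] += 1
    return counts


# Hypothetical records, each listing the subject areas of one publication.
records = [
    ["Engineering", "Computer Science"],
    ["Computer Science", "Engineering", "Mathematics"],
    ["Medicine"],
]
pairs = subject_cooccurrence(records)
```

    The resulting counts can be read as weighted edges of a subject-area network, where heavier edges indicate fields that appear together more often.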

    An intercalation-locked parallel-stranded DNA tetraplex

    Funding for Open Access provided by the UMD Libraries Open Access Publishing Fund. DNA has proved to be an excellent material for nanoscale construction because complementary DNA duplexes are programmable and structurally predictable. However, in the absence of Watson–Crick pairings, DNA can be structurally more diverse. Here, we describe the crystal structures of d(ACTCGGATGAT) and the brominated derivative, d(ACBrUCGGABrUGAT). These oligonucleotides form parallel-stranded duplexes with a crystallographically equivalent strand, resulting in the first examples of DNA crystal structures that contain four different symmetric homo base pairs. Two of the parallel-stranded duplexes are coaxially stacked in opposite directions and locked together to form a tetraplex through intercalation of the 5’-most A–A base pairs between adjacent G–G pairs in the partner duplex. The intercalation region is a new type of DNA tertiary structural motif with similarities to the i-motif. 1H–1H nuclear magnetic resonance and native gel electrophoresis confirmed the formation of a parallel-stranded duplex in solution. Finally, we modified specific nucleotide positions and added d(GAY) motifs to oligonucleotides and were readily able to obtain similar crystals. This suggests that this parallel-stranded DNA structure may be useful in the rational design of DNA crystals and nanostructures.

    Identifying key interactions between process variables of different material categories using mutual information-based network inference method

    This paper analyzes production data from injection molding processes to identify key interactions between the process variables from different material categories using the network inference method called "bagging conservative causal core network" (BC3net). This approach is an ensemble method that measures mutual information between process variables to select pairs that share significant information. We construct networks for different time intervals and aggregate them by calculating the proportion of significant pairs of process variables (weighted edges) for each production process over time. The weighted edges of the aggregated network for each product are used in a machine learning model to optimize the network interval size (interval split) and feature selection, where edge weights are the input features and material categories are the output classification labels. The time intervals are optimized based on the classification accuracy of the machine learning model. Our analysis shows that the aggregated edge features of inferred networks can classify different material categories and identify critical features that represent interdependence in the associated process variables. We further used the "one vs. other" labels for the machine learning models to identify material-specific interactions for each material category. Additionally, we constructed an aggregated network over all samples in which the process variable interactions were steady over time. The resulting network showed modular characteristics where process variables of similar categories were grouped in the same community.
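
    The aggregation step described in this abstract, where an edge weight is the proportion of time intervals in which a pair of process variables is significant, can be sketched as follows. This is not BC3net itself: the significance test on mutual information is assumed to have been done already, and the function name `aggregate_edge_weights` and the process-variable names are hypothetical.

```python
from collections import Counter


def aggregate_edge_weights(interval_networks):
    """Aggregate per-interval networks into one weighted network.

    `interval_networks` is a list with one entry per time interval; each entry
    is a collection of (variable_a, variable_b) pairs judged significant in
    that interval. The weight of an edge is the fraction of intervals in which
    the pair occurs. Edges are undirected and keyed as sorted tuples.
    """
    counts = Counter()
    for edges in interval_networks:
        # Use a set of frozensets so each undirected pair counts once per interval.
        counts.update({frozenset(edge) for edge in edges})
    n_intervals = len(interval_networks)
    return {tuple(sorted(edge)): c / n_intervals for edge, c in counts.items()}


# Hypothetical significant pairs for three time intervals of one process.
intervals = [
    [("melt_temp", "injection_pressure"), ("cycle_time", "melt_temp")],
    [("injection_pressure", "melt_temp")],
    [("melt_temp", "injection_pressure"), ("cycle_time", "injection_pressure")],
]
weights = aggregate_edge_weights(intervals)
```

    The resulting dictionary maps each variable pair to a value between 0 and 1, which could then serve as one input feature per edge for a classifier, as the abstract describes.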